Research on Deep Web Query Interface Clustering Based on Hadoop

Authors

  • Baohua Qiang
  • Rui Zhang
  • Yufeng Wang
  • Qian He
  • Wei Li
  • Sai Wang
Abstract

How to cluster different query interfaces effectively is one of the core issues in generating an integrated query interface in the Deep Web integration domain. However, with the rapid development of Internet technology, the number of Deep Web query interfaces has grown explosively. As a result, traditional stand-alone Deep Web query interface clustering approaches run into bottlenecks in both time complexity and space complexity. After further study of the Hadoop distributed platform and the MapReduce programming model, a Deep Web query interface clustering algorithm based on the Hadoop platform is designed and implemented, in which the Vector Space Model (VSM) and Latent Semantic Analysis (LSA) are employed to represent the “Query Interfaces-Attributes” relationships. The experimental results show that the proposed algorithm achieves better scalability and speedup ratio by using the Hadoop architecture.
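The abstract does not give the algorithm's details, so the following is only a minimal sketch, under stated assumptions, of how the named pieces could fit together. It builds a binary VSM matrix (interfaces × attributes), projects it with LSA via a truncated SVD, and clusters the projected interfaces with k-means written as a map function and a reduce function. The toy interface data, the choice of k-means, the farthest-first seeding, and every function name (`build_vsm`, `lsa`, `map_phase`, `reduce_phase`, `mapreduce_kmeans`) are hypothetical illustrations, not the authors' method, and the MapReduce phases are simulated in plain Python rather than run on Hadoop.

```python
import numpy as np
from collections import defaultdict

# Hypothetical toy data: each query interface is represented by the attribute
# labels it exposes (e.g. extracted from its HTML form).
interfaces = {
    "book_site_1": ["title", "author", "isbn", "publisher"],
    "book_site_2": ["title", "author", "keyword"],
    "car_site_1": ["make", "model", "year", "price"],
    "car_site_2": ["make", "model", "price", "mileage"],
}


def build_vsm(interfaces):
    """VSM: one row per interface, one column per attribute, 1.0 if present."""
    names = sorted(interfaces)
    vocab = sorted({a for attrs in interfaces.values() for a in attrs})
    col = {a: j for j, a in enumerate(vocab)}
    A = np.zeros((len(names), len(vocab)))
    for i, name in enumerate(names):
        for a in interfaces[name]:
            A[i, col[a]] = 1.0
    return names, vocab, A


def lsa(A, k):
    """LSA: keep the top-k singular directions of the interface-attribute matrix."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]  # row i = interface i in the k-dim latent space


def nearest(centroids, p):
    """Index of the centroid closest (Euclidean) to point p."""
    return int(np.argmin(np.linalg.norm(centroids - p, axis=1)))


def map_phase(points, centroids):
    """Mapper (simulated): emit (nearest centroid id, (vector, 1)) per interface."""
    for p in points:
        yield nearest(centroids, p), (p, 1)


def reduce_phase(pairs, centroids):
    """Shuffle + reducer (simulated): sum vectors and counts per key, then average."""
    acc = defaultdict(lambda: [0.0, 0])
    for cid, (p, n) in pairs:
        acc[cid][0] = acc[cid][0] + p
        acc[cid][1] += n
    new = centroids.copy()  # a centroid with no points keeps its old position
    for cid, (total, count) in acc.items():
        new[cid] = total / count
    return new


def farthest_first_init(points, k):
    """Deterministic seeding: start at point 0, then repeatedly add the farthest point."""
    idx = [0]
    while len(idx) < k:
        d = np.min([np.linalg.norm(points - points[i], axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(d)))
    return points[idx].copy()


def mapreduce_kmeans(points, k, max_iters=20):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iters):  # each iteration would correspond to one MapReduce job
        new = reduce_phase(map_phase(points, centroids), centroids)
        if np.allclose(new, centroids):
            break
        centroids = new
    return [nearest(centroids, p) for p in points]


names, vocab, A = build_vsm(interfaces)
points = lsa(A, k=2)
labels = mapreduce_kmeans(points, k=2)
clusters = defaultdict(list)
for name, label in zip(names, labels):
    clusters[label].append(name)
print(sorted(sorted(c) for c in clusters.values()))
# [['book_site_1', 'book_site_2'], ['car_site_1', 'car_site_2']]
```

On an actual Hadoop cluster, the mapper and reducer would run as separate tasks and the current centroids would have to be shipped to every mapper (for example through Hadoop's distributed cache); that wiring is omitted here. Keeping the per-iteration state down to the centroids is what makes k-means a common fit for MapReduce, since only the small centroid set moves between jobs while the interface vectors stay partitioned.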


Similar resources

Deep Web Integrated Query Interface Construction Method Based on Apriori Algorithm

Deep Web contains numerous data resources and has been a hot topic in the database research field. Many studies have focused on Deep Web query interface discovery, form information extraction, etc. However, very few studies have so far addressed Deep Web integrated query interface construction. This paper provides an integrated interface construction method based on Apri...


Traitor: Associating Concepts using the World Wide Web

We use Common Crawl’s 25TB data set of web pages to construct a database of associated concepts using Hadoop. The database can be queried through a web application with two query interfaces. A textual interface allows searching for similarities and differences between multiple concepts using a query language similar to set notation, and a graphical interface allows users to visualize similarity...


Comparison of Clustering Methods over a Hidden Web Data using Stratification

This paper focuses on the problem of data mining (in general) and clustering (in particular) on hidden web data. Data mining is a process that analyzes and extracts knowledge from large amounts of data, providing useful information to users. Hidden or deep web data resides in databases located on remote systems. So, to access such data, we need a query interface or HTM...


Modeling and Extracting Deep-Web Query Interfaces

Interface modeling & extraction is a fundamental step in building a uniform query interface to a multitude of databases on the Web. Existing solutions are limited in that they assume interfaces are flat and thus ignore the inherent structure of interfaces, which then seriously hampers the effectiveness of interface integration. To address this limitation, in this chapter, we model an interface ...


Identification and Classification of Deep Web Query Interfaces via Ontology

To obtain the large quantities of valuable information on the Deep Web, it is necessary to discover the relevant individual query interfaces and design an integrated query interface through which user query requests can be submitted. The key challenges are to identify and classify Deep Web query interfaces accurately. In view of the regular data of the Deep Web, we consider constructing the Deep We...



Journal title:
  • JSW

Volume 9, Issue

Pages -

Publication date: 2014